feat: Retry middleware + Finagle RetryBudget (0.4.0 slice 1) #22
Merged
The original scaffold's __init__.py imported Retry and RetryBudget before either class existed, which both broke ruff F401 and would have raised ImportError at module load time during the intermediate tasks (Tasks 4-6 import from httpware.middleware.resilience.budget, which triggers parent __init__.py execution). Defer the re-exports to Task 7 once both classes are defined; update the plan to reflect this. Pure dev-loop fix, no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ailures Refines _terminal so httpx2.NetworkError-family exceptions (ConnectError, ReadError, WriteError, PoolTimeout) map to httpware.NetworkError. InvalidURL and CookieConflict stay bare TransportError. Prerequisite for the Retry middleware so it can retry transient failures without retrying typos.
PoolTimeout inherits from httpx2.TimeoutException, not NetworkError, so the bucket is connect/read/write/close. The dispatch logic is unchanged (PoolTimeout was already caught by the timeout branch above NetworkError), but the docstring was misleading future maintainers and the upcoming Retry middleware author about what NetworkError actually wraps. Update spec + plan to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Distinct exception raised by the Retry middleware when the RetryBudget refuses to permit a retry. Carries last_response / last_exception / attempts. Inherits ClientError so callers catching ClientError already handle it.
… assertion Keyword-only __init__ breaks Python's default exception pickle protocol (unpickle calls cls(str_message) positionally). Mirror StatusError's pattern: add _reconstruct_budget_exhausted + __reduce__. A budget-exhausted error is the most likely error to cross process boundaries (the budget exists to protect against load), so silently losing diagnostic context via UnpicklingError would be bad. Also tightens the summary-message test from "'5' in str(exc)" to an exact-string match — the digit-in-string check would pass for many wrong messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
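The pickle problem described above can be reproduced with a small stand-alone class. This is a minimal sketch, not the library's code: `BudgetExhaustedSketch` and `_reconstruct` are hypothetical names, and the real error carries more fields (for example last_response).

```python
import pickle
from typing import Optional


class BudgetExhaustedSketch(Exception):
    """Hypothetical stand-in for an exception with a keyword-only __init__."""

    def __init__(self, *, attempts: int, last_exception: Optional[BaseException] = None) -> None:
        super().__init__(f"retry budget exhausted after {attempts} attempt(s)")
        self.attempts = attempts
        self.last_exception = last_exception

    def __reduce__(self):
        # The default exception pickling would rebuild the object as
        # cls(*self.args), passing the message positionally, which a
        # keyword-only __init__ rejects. Route through a helper instead.
        return (_reconstruct, (self.attempts, self.last_exception))


def _reconstruct(attempts: int, last_exception: Optional[BaseException]) -> BudgetExhaustedSketch:
    # Module-level so that pickle can locate it by name when loading.
    return BudgetExhaustedSketch(attempts=attempts, last_exception=last_exception)


original = BudgetExhaustedSketch(attempts=3, last_exception=ValueError("boom"))
restored = pickle.loads(pickle.dumps(original))
```

The helper has to live at module level because pickle stores only its qualified name, not the function body.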
Finagle-style: ttl=10s, min_retries_per_sec=10, percent_can_retry=0.2. Deterministic time via injected _now callable for tests.
- Add a comment to _purge explaining the strict-< window semantics ([now-ttl, now] inclusive). Prevents a future reader (or Hypothesis test in Task 5) from expecting <= and being surprised. - Tighten the test_deposit_after_exhaustion docstring — "floor (int truncation)" used the word "floor" for two different things in one sentence; rephrase to "int() truncates to 0". Pure documentation; no behavior change. 145 tests still pass at 100%. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
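For readers unfamiliar with the Finagle-style budget, the following is a rough sketch of a sliding-window budget with the three parameters named above and an injected clock. The class name, method names, and the exact balance formula are assumptions made for illustration; the shipped RetryBudget may account for tokens differently.

```python
import time
from collections import deque
from typing import Callable, Deque


class RetryBudgetSketch:
    """Illustrative sliding-window retry budget (not the shipped class)."""

    def __init__(
        self,
        *,
        ttl: float = 10.0,
        min_retries_per_sec: float = 10.0,
        percent_can_retry: float = 0.2,
        _now: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl
        self._floor = min_retries_per_sec * ttl  # retries always allowed per window
        self._percent = percent_can_retry
        self._now = _now
        self._deposits: Deque[float] = deque()
        self._withdrawals: Deque[float] = deque()

    def _purge(self, now: float) -> None:
        # Strict "<": an entry stamped exactly now - ttl is still inside the
        # window, so the window is [now - ttl, now] inclusive.
        cutoff = now - self._ttl
        for queue in (self._deposits, self._withdrawals):
            while queue and queue[0] < cutoff:
                queue.popleft()

    def deposit(self) -> None:
        """Record one ordinary request."""
        now = self._now()
        self._purge(now)
        self._deposits.append(now)

    def balance(self) -> int:
        """Retries still permitted in the current window."""
        self._purge(self._now())
        allowed = int(self._floor + self._percent * len(self._deposits))
        return allowed - len(self._withdrawals)

    def try_withdraw(self) -> bool:
        """Spend one retry if the budget permits it."""
        if self.balance() <= 0:
            return False
        self._withdrawals.append(self._now())
        return True


# Deterministic clock for the demonstration.
clock = [0.0]
budget = RetryBudgetSketch(
    ttl=10.0, min_retries_per_sec=0.0, percent_can_retry=0.2, _now=lambda: clock[0]
)
for _ in range(5):
    budget.deposit()
first = budget.try_withdraw()   # five deposits at 20% allow one retry
second = budget.try_withdraw()  # the single token is already spent
```

Injecting the clock keeps the window arithmetic testable without sleeping, which is the same motivation the commit gives for the `_now` callable.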
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t ttl margin - test_more_deposits_never_decreases_budget: was only varying extra_deposits with ttl/min_rps/percent hardcoded; now parameterizes all four so the monotonicity claim covers the actual claimed parameter space (e.g. percent=0.0 case where monotonicity comes entirely from the floor). - test_advancing_past_ttl_purges_deposits: replace ttl + 0.1 epsilon with ttl * 2.0. Self-evident; stays safe regardless of strategy max_value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements full_jitter_delay() per AWS's "full jitter" formulation: sleep = uniform(0, min(max_delay, base_delay * 2**attempt_index)). Injects a deterministic _random_uniform kwarg for testability. Adds tests/test_backoff.py to keep 100% coverage gate green until Task 7 adds Retry integration tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…attempt_index base_delay * (2 ** attempt_index) raises OverflowError at attempt_index >= 1024 because 2**1024 is too large for int→float conversion. Float exponentiation saturates to math.inf, which min() then clamps to max_delay. The current Retry default max_attempts=3 makes this unreachable in practice, but the function takes attempt_index unbounded, so closing the trap is a one-char fix at zero behavior cost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
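The full-jitter formula quoted above can be sketched as follows. The function name, default values, and the explicit OverflowError guard are choices made for this illustration; the guard is there so the sketch does not depend on how a very large exponentiation behaves, and it is not claimed to match the shipped fix.

```python
import random
from typing import Callable


def full_jitter_delay_sketch(
    attempt_index: int,
    *,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    _random_uniform: Callable[[float, float], float] = random.uniform,
) -> float:
    """sleep = uniform(0, min(max_delay, base_delay * 2**attempt_index))."""
    try:
        ceiling = base_delay * (2.0 ** attempt_index)
    except OverflowError:
        # Very large attempt indexes cannot be represented; treat the
        # exponential term as "larger than any cap".
        ceiling = max_delay
    return _random_uniform(0.0, min(max_delay, ceiling))


def upper_bound(low: float, high: float) -> float:
    """Deterministic replacement for random.uniform: always pick the top."""
    return high


capped = full_jitter_delay_sketch(3, base_delay=1.0, max_delay=5.0, _random_uniform=upper_bound)
huge = full_jitter_delay_sketch(5000, base_delay=1.0, max_delay=5.0, _random_uniform=upper_bound)
```

Passing a deterministic `_random_uniform` turns the jittered delay into its upper bound, which makes the cap easy to assert on.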
Hypothesis writes a constants/ cache under .hypothesis/ when property tests run, which makes `just lint` (eof-fixer .) try to "fix" cache files outside the tracked tree. Add to root .gitignore alongside .pytest_cache / .ruff_cache. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers: happy path, 503-then-200, max_attempts exhaustion with PEP-678 note, idempotency gate (POST not retried by default, opt-in via retry_methods), non-retryable status passthrough (404 raised immediately). Exception-based retry, attempt_timeout, Retry-After, and budget integration follow in subsequent commits.
…__init__.py - retry.py:106: spec extracted the assert message to its own line for ruff's message-extraction rule; that line wasn't covered by the trailing # pragma: no cover. Add the pragma on the msg line too. 100% gate green again. - resilience/__init__.py: Task 7 plan Step 4 was to re-export Retry and RetryBudget; missed in the prior commit. Re-export them with __all__ so the public resilience package surface is complete. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Retry.__call__: assert last_exc is not None is stripped by python -O,
which would convert the structural invariant into an AttributeError
("NoneType has no attribute add_note") in optimized runtimes. Replace
with an if guard that raises AssertionError unconditionally.
- test_budget_exhausted_raises_retry_budget_exhausted_error: add a NOTE
warning Task 11 not to duplicate this test (it lives in Task 7 only
because the budget gate's RetryBudgetExhaustedError branch needs coverage
from the Retry-loop side).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
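The difference between an assert statement and an explicit guard can be sketched as below. The helper name and the note text are hypothetical; only the pattern (an if guard that raises AssertionError, followed by a PEP-678 note) comes from the commit message.

```python
from typing import Optional


def annotate_last_failure(last_exc: Optional[BaseException], attempts: int) -> BaseException:
    """Attach a PEP-678 note to the final failure of a retry loop."""
    # "assert last_exc is not None" would be removed under "python -O", and
    # the next line would then fail with an AttributeError on None. An
    # explicit guard keeps the invariant check in optimized runs too.
    if last_exc is None:
        msg = "retry loop finished without recording a failure"
        raise AssertionError(msg)
    if hasattr(last_exc, "add_note"):  # add_note exists on Python 3.11+
        last_exc.add_note(f"gave up after {attempts} attempt(s)")
    return last_exc


annotated = annotate_last_failure(ValueError("503"), 3)
```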
Retries NetworkError and TimeoutError on idempotent methods. Bare TransportError (e.g., InvalidURL) is NOT retried since it escaped the NetworkError refinement in errors.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…with network test) test_retries_on_network_error already asserts len(sleeper.calls) == 1; the matching timeout test didn't, so a regression bypassing backoff on the TimeoutError branch would go undetected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps each attempt in asyncio.timeout(); maps asyncio.TimeoutError to httpware.TimeoutError. Caught timeouts count as retryable failures subject to the idempotency + attempt-count gates.
- retry.py: add inline comment to wrapped.__cause__ = exc explaining it is load-bearing on the retry path (last_exc = wrapped) where no `raise ... from ...` clause sets the cause. Prevents future "cleanup" that silently breaks exhaustion chain display. - test_retry.py: expand the pragma rationale to record how it was verified the assertions still execute (intentional break → test fails). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parsed delay overrides backoff and is capped at max_delay. Malformed values fall back to backoff. respect_retry_after=False disables the override. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- _parse_retry_after: clamp the integer branch to max(0.0, ...) matching the HTTP-date branch. Negative Retry-After values from malformed servers silently became negative delays passed to asyncio.sleep (which treats them as 0). Explicit clamp removes the inconsistency. - test_respect_retry_after_false_ignores_header: add len(sleeper.calls)==1 assertion before indexing, matching the pattern used by test_retries_503. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
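The two Retry-After forms and the clamping behavior can be sketched with the standard library alone. The function below is an illustration under assumptions (its name, the `now` parameter, and returning None for malformed input are all choices made here), not the library's _parse_retry_after.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after_sketch(
    value: str, *, max_delay: float, now: Optional[datetime] = None
) -> Optional[float]:
    """Return a delay in seconds, or None when the header is malformed."""
    text = value.strip()
    try:
        # Delay-seconds form, e.g. "120".
        seconds = float(int(text))
    except ValueError:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        try:
            when = parsedate_to_datetime(text)
        except (TypeError, ValueError):
            return None  # caller falls back to its normal backoff
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        current = now if now is not None else datetime.now(timezone.utc)
        seconds = (when - current).total_seconds()
    # Clamp both branches: never negative, never above max_delay.
    return min(max_delay, max(0.0, seconds))


reference = datetime(2015, 10, 21, 7, 27, 50, tzinfo=timezone.utc)
from_date = parse_retry_after_sketch(
    "Wed, 21 Oct 2015 07:28:00 GMT", max_delay=30.0, now=reference
)
```

Applying one clamp after both branches avoids the inconsistency the commit describes, where only the HTTP-date branch guarded against negative values.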
Verifies RetryBudgetExhaustedError field population (last_response on status path, last_exception on network path), per-instance fresh default budget, and explicit budget sharing across multiple Retry middlewares.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…orkError Completes the 0.4.0 slice 1: retry middleware + Finagle-style budget + NetworkError refinement for transient httpx2 failures.
Stale references to "Task 7" and "through Tasks 6-7" don't survive the shipped branch. Reword to describe what the file actually does (unit coverage of the pure helper; integration via Retry tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review
Overview
Ships Epic 3 slice 1 (Resilience):
Correctness
Solid:
One minor observation, not a blocker:
Project conventions
All CI-enforced invariants respected:
Test coverage
Gaps worth noting (not blockers):
Suggestions for follow-up
These are improvements, not change requests for this PR:
Risks
Verdict
Ship it. Careful, well-tested, conventions-respecting work. The Finagle defaults are battle-tested numbers and the
🤖 Generated with Claude Code
Summary
Ships the first ship-unit of Epic 3 (Resilience): a Retry middleware backed by a Finagle-style RetryBudget token bucket, plus a NetworkError(TransportError) refinement so retry can distinguish transient failures from non-retryable ones.
- Retry middleware (from httpware import Retry): retries on transient network errors (NetworkError), timeouts (TimeoutError), and retryable status codes (408/429/502/503/504) for idempotent methods (GET/HEAD/OPTIONS/PUT/DELETE) by default. Exponential backoff with full jitter, honors Retry-After (seconds + HTTP-date forms, capped at max_delay), optional attempt_timeout= wall-clock cap per attempt, PEP-678 note on exhaustion.
- RetryBudget token bucket (from httpware import RetryBudget): Finagle/AWS-SDK defaults (ttl=10s, percent_can_retry=0.2, min_retries_per_sec=10). Default is fresh per-Retry-instance; explicit RetryBudget can be shared across multiple clients.
- NetworkError(TransportError): refines AsyncClient._terminal mapping so httpx2.ConnectError / ReadError / WriteError / CloseError raise NetworkError; InvalidURL / CookieConflict keep raising bare TransportError. Backwards-compat (catches of TransportError still work).
- RetryBudgetExhaustedError(ClientError): raised when retry was needed but the budget refused; carries last_response / last_exception / attempts.
3-1 (per-attempt timeout) dissolved into Retry.attempt_timeout= rather than shipping standalone. Stories 3-2/3-3/3-4 land here. Remaining Epic 3: 3-5 Bulkhead, 3-6 extension-slot docs.
Out of scope (deliberate, per spec): per-call retry override via extensions, Backoff protocol, retry_on_exception= configuration, streaming-body retry (Epic 4 prerequisite, tracked in planning/deferred-work.md).
Test Plan
- just test: 183 passing, 100% line coverage
- just lint-ci: eof-fixer, ruff format, ruff check, ty check all clean
- CLAUDE.md) verified: no httpx2._ private API, no from __future__ import annotations, no print(), no global logging, no # type: / # mypy: ignore
- Retry / RetryBudget ship in core, no new optional dep)
- RetryBudget and Retry
- RetryBudgetExhaustedError pickle round-trip tested (kwargs-only init needs __reduce__)
- NetworkError correctly routes the four transient httpx2 exception subclasses (verified via tests/test_error_mapping_terminal.py)
Refs
🤖 Generated with Claude Code